| Name | Version | Summary | Date |
|------|---------|---------|------|
| codebleu | 0.7.0 | Unofficial CodeBLEU implementation supporting Linux, macOS, and Windows, available on PyPI. | 2024-05-30 10:32:09 |
| langsmith | 0.1.64 | Client library for connecting to the LangSmith LLM Tracing and Evaluation Platform. | 2024-05-30 09:45:18 |
| AutoRAG | 0.2.2 | Automatically evaluates RAG pipelines with your own data and finds the optimal structure for a new RAG product. | 2024-05-30 03:03:03 |
| opencompass | 0.2.5 | A comprehensive toolkit for large-model evaluation. | 2024-05-29 16:35:35 |
| coconut-develop | 3.1.0.post0.dev15 | Simple, elegant, Pythonic functional programming. | 2024-05-29 04:33:32 |
| dyff-client | 0.8.0 | Python client for the Dyff AI auditing platform. | 2024-05-25 00:53:20 |
| dyff-schema | 0.9.1 | Data models for the Dyff AI auditing platform. | 2024-05-25 00:05:54 |
| agenta | 0.14.14 | SDK for Agenta, an open-source LLMOps platform. | 2024-05-24 22:36:25 |
| promptbench | 0.0.3 | A tool for analyzing how large language models interact with various prompts; provides infrastructure to simulate **black-box** adversarial **prompt attacks** on models and evaluate their performance. | 2024-05-22 23:05:24 |
| dyff-audit | 0.4.0 | Audit tools for the Dyff AI auditing platform. | 2024-05-22 20:53:48 |
| torcheval-nightly | 2024.5.21 | A simple interface for creating new metrics and an easy-to-use toolkit for metric computation and checkpointing. | 2024-05-21 12:10:57 |
| jury | 2.3.1 | Evaluation toolkit for neural language generation. | 2024-05-20 08:28:12 |
| phasellm | 0.0.22 | Wrappers for common large language models (LLMs) with support for evaluation. | 2024-05-18 22:48:40 |
| dyff | 0.19.0 | Meta-package that installs the local SDK for the Dyff AI auditing platform. | 2024-05-17 19:35:53 |
| uptrain | 0.7.1 | Tool to evaluate LLM applications on aspects such as factual accuracy, response quality, retrieval quality, and tonality. | 2024-05-14 09:19:40 |
| redlite | 0.2.0 | LLM testing on steroids. | 2024-05-10 17:31:30 |
| promptmodel | 0.1.19 | Prompt and model versioning on the cloud, built for developers. | 2024-05-10 02:36:18 |
| evo | 1.28.0 | Python package for the evaluation of odometry and SLAM. | 2024-05-09 10:33:54 |
| langcheck | 0.7.1 | Simple, Pythonic building blocks to evaluate LLM-based applications. | 2024-05-08 14:45:03 |
| trajectopy | 2.0.14 | Trajectory evaluation in Python. | 2024-05-08 10:36:42 |